Decision Tree based Supervised Word Sense Disambiguation for Assamese

Authors

  • Jumi Sarmah
  • Shikhar Kr. Sarma
Abstract

Word Sense Disambiguation (WSD) aims to automatically determine which sense of a word with multiple senses is intended in a given context. A sense is a meaning of a word, and words that can take different meanings depending on context are referred to as ambiguous words. WSD is vital to many important Natural Language Processing tasks such as MT, IR, TC, and SP. This paper proposes a supervised Machine Learning approach, the Decision Tree (DT), for the Word Sense Disambiguation task in the Assamese language. A Decision Tree is a decision model with a flow-chart-like tree structure in which each internal node denotes a test, each branch represents an outcome of the test, and each leaf holds a sense label. J48, a Java implementation of the C4.5 decision tree algorithm, is used for experimentation. A few polysemous words, with their real occurrences in Assamese text and manual sense annotation, were collected as the training and test dataset. The DT algorithm produces an average F-measure of 0.611 under 10-fold cross-validation on 10 Assamese ambiguous words.
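The tree structure described in the abstract can be illustrated with a minimal Python sketch. This is not the J48/C4.5 implementation used in the paper: the tree below is hand-built rather than learned, and the English word "bank", its sense labels, and the context-word tests are hypothetical stand-ins chosen for readability. The names `Leaf`, `Node`, `classify`, and `TOY_TREE` are likewise assumptions made for this illustration.

```python
from dataclasses import dataclass
from typing import Dict, Set, Union


@dataclass
class Leaf:
    sense: str  # each leaf holds a sense label


@dataclass
class Node:
    feature: str                  # internal node: tests whether this word is in the context
    branches: Dict[bool, "Tree"]  # each branch corresponds to one outcome of the test


Tree = Union[Leaf, Node]


def classify(tree: Tree, context: Set[str]) -> str:
    """Follow test outcomes from the root until a leaf (sense label) is reached."""
    while isinstance(tree, Node):
        tree = tree.branches[tree.feature in context]
    return tree.sense


# Hypothetical hand-built tree for the English word "bank" (illustration only).
TOY_TREE = Node("river", {
    True: Leaf("bank/shore"),
    False: Node("fishing", {
        True: Leaf("bank/shore"),
        False: Leaf("bank/finance"),
    }),
})

print(classify(TOY_TREE, {"river", "water"}))
print(classify(TOY_TREE, {"money", "loan"}))
```

In a learned tree such as one produced by C4.5, the tests and their order would be chosen from the annotated training data rather than written by hand.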

Similar resources

Hierarchical Decision Lists for Word Sense Disambiguation

This paper describes a supervised algorithm for word sense disambiguation based on hierarchies of decision lists. This algorithm supports a useful degree of conditional branching while minimizing the training data fragmentation typical of decision trees. Classifications are based on a rich set of collocational, morphological and syntactic contextual features, extracted automatically from traini...

An Empirical Evaluation of Knowledge Sources and Learning Algorithms for Word Sense Disambiguation

In this paper, we evaluate a variety of knowledge sources and supervised learning algorithms for word sense disambiguation on SENSEVAL-2 and SENSEVAL-1 data. Our knowledge sources include the part-of-speech of neighboring words, single words in the surrounding context, local collocations, and syntactic relations. The learning algorithms evaluated include Support Vector Machines (SVM), Naive Bay...

A Comparative Study of Support Vector Machines Applied to the Supervised Word Sense Disambiguation Problem in the Medical Domain

We have applied five supervised learning approaches to word sense disambiguation in the medical domain. Our objective is to evaluate Support Vector Machines (SVMs) in comparison with other well-known supervised learning algorithms including the naïve Bayes classifier, C4.5 decision trees, decision lists and boosting approaches. Based on these results we introduce further refinements of these ap...

A Decision Tree of Bigrams is an Accurate Predictor of Word Sense

This paper presents a corpus-based approach to word sense disambiguation where a decision tree assigns a sense to an ambiguous word based on the bigrams that occur nearby. This approach is evaluated using the sense-tagged corpora from the 1998 SENSEVAL word sense disambiguation exercise. It is more accurate than the average results reported for 30 of 36 words, and is more accurate than the best...

A New Supervised Learning Algorithm for Word Sense Disambiguation

The Naive Mix is a new supervised learning algorithm that is based on a sequential method for selecting probabilistic models. The usual objective of model selection is to find a single model that adequately characterizes the data in a training sample. However, during model selection a sequence of models is generated that consists of the best-fitting model at each level of model complexity. The Nai...

Publication date: 2016